
Alibaba Cloud · Chat / LLM · 80B Parameters (3.9B Active) · 256K Context

Streaming · Reasoning · Chain-of-Thought · Long Context · Code · Agentic Planning

Overview
Qwen3 Next 80B A3B Thinking is a next-generation foundation model from Alibaba’s Qwen team, pairing a hybrid attention mechanism (Gated DeltaNet + Gated Attention) with a high-sparsity MoE architecture. With 80B total parameters and only 3.9B active per token, it delivers up to 10x higher throughput than Qwen3-32B on long contexts while outperforming Gemini-2.5-Flash-Thinking on multiple benchmarks. Designed exclusively for deep reasoning tasks, it operates in thinking-only mode, surfacing a full chain-of-thought trace before every response. Served instantly via the Qubrid AI Serverless API.

🧠 10x throughput vs Qwen3-32B. Outperforms Gemini-2.5-Flash-Thinking. 3.9B active parameters. Deploy on Qubrid AI — no infrastructure required.
Model Specifications
| Field | Details |
|---|---|
| Model ID | Qwen/Qwen3-Next-80B-A3B-Thinking |
| Provider | Alibaba Cloud (Qwen Team) |
| Kind | Chat / LLM |
| Architecture | Hybrid attention (Gated DeltaNet linear attention + Gated Attention) with High-Sparsity MoE and Multi-Token Prediction (MTP) |
| Parameters | 80B total (3.9B active per token) |
| Context Length | 256,000 Tokens |
| MoE | Yes — high-sparsity (3.9B of 80B parameters active per token) |
| Release Date | September 2025 |
| License | Apache 2.0 |
| Training Data | Large-scale multilingual pretraining dataset, fine-tuned with GSPO for thinking |
| Function Calling | Not Supported |
| Image Support | N/A |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
| State | 🟢 Ready |
Pricing
💳 Access via the Qubrid AI Serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.20 |
| Output Tokens | $1.80 |
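At these rates, per-request cost is straightforward to estimate. The sketch below applies the prices from the table; the token counts in the example are illustrative.

```python
# Estimate request cost at the listed serverless rates:
# $0.20 per 1M input tokens, $1.80 per 1M output tokens.
INPUT_PER_M = 0.20
OUTPUT_PER_M = 1.80

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A reasoning-heavy call: 2K prompt tokens, 8K thinking + answer tokens.
print(f"${estimate_cost(2_000, 8_000):.4f}")  # → $0.0148
```

Because thinking traces count as output tokens, reasoning-heavy prompts are dominated by the $1.80/1M output rate.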
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace `QUBRID_API_KEY` in the code below with your actual key
💡 Thinking mode: This model always produces chain-of-thought reasoning traces before its final answer. Plan for higher output token counts on complex tasks — use max_tokens accordingly.
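Qwen3 thinking models conventionally wrap their reasoning in `<think>…</think>` tags ahead of the final answer. Assuming that trace format (verify against actual responses), a minimal split between reasoning and answer looks like:

```python
# Separate the chain-of-thought trace from the final answer,
# assuming the response uses <think>...</think> delimiters.
def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from a raw completion."""
    start, end = text.find("<think>"), text.find("</think>")
    if start == -1 or end == -1:
        return "", text.strip()  # no trace found: treat everything as the answer
    return text[start + 7:end].strip(), text[end + 8:].strip()

trace, answer = split_thinking(
    "<think>Assume √2 = p/q in lowest terms...</think>Therefore √2 is irrational."
)
```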
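With those prerequisites in place, a minimal request can be sketched as below. It assumes Qubrid's OpenAI-compatible chat-completions endpoint; the `API_URL` is a placeholder, so confirm the real base URL at docs.platform.qubrid.com.

```python
import json
import urllib.request

# Placeholder endpoint — check the Qubrid docs for the actual base URL.
API_URL = "https://platform.qubrid.com/v1/chat/completions"

def build_request(api_key: str, prompt: str, max_tokens: int = 8192) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the Thinking model."""
    payload = {
        "model": "Qwen/Qwen3-Next-80B-A3B-Thinking",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,   # thinking traces inflate output length
        "temperature": 0.6,         # lower values recommended for reasoning
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    req = build_request("QUBRID_API_KEY", "Prove that the square root of 2 is irrational")
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
```

Since the API is OpenAI-compatible, the official OpenAI SDK also works by swapping in the Qubrid base URL.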
Live Example
Prompt: Prove that the square root of 2 is irrational
Response:
Playground Features
The Qubrid AI Playground lets you interact with Qwen3 Next 80B Thinking directly in your browser — no setup, no code, no cost to explore.

🧠 System Prompt
Define the model’s reasoning approach, domain focus, and output format before the conversation begins. Particularly powerful for mathematical proofs, agentic planning, and structured analytical tasks.

Set your system prompt once in the Qubrid Playground and it applies across every turn of the conversation.
🎯 Few-Shot Examples
Guide the model’s reasoning style and output structure with concrete examples — no fine-tuning, no retraining required.

| User Input | Assistant Response |
|---|---|
| Is 97 a prime number? | Yes. 97 is prime. Check divisibility by all primes ≤ √97 ≈ 9.8: not divisible by 2, 3, 5, or 7. Therefore 97 has no factors other than 1 and itself. |
| Write a Python function to check if a string is a palindrome | `def is_palindrome(s: str) -> bool: s = s.lower().replace(" ", ""); return s == s[::-1]` |
💡 Stack multiple few-shot examples in the Qubrid Playground to establish your preferred reasoning format and output structure — no fine-tuning required.
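The same few-shot technique works over the API by prepending example turns to the messages list. The helper below is an illustrative sketch, not a Qubrid SDK function:

```python
# Replicate the Playground's few-shot setup programmatically:
# system prompt, then example (user, assistant) pairs, then the real query.
def few_shot_messages(system_prompt, examples, user_input):
    """Assemble an OpenAI-style chat history with few-shot examples."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_input})
    return messages

msgs = few_shot_messages(
    "Answer with a short verdict followed by a one-line justification.",
    [("Is 97 a prime number?",
      "Yes. No prime <= sqrt(97) (2, 3, 5, 7) divides it.")],
    "Is 221 a prime number?",
)
```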
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output |
| Temperature | number | 0.6 | Controls randomness. Lower values recommended for reasoning tasks |
| Max Tokens | number | 8192 | Maximum number of tokens to generate |
| Top P | number | 0.95 | Nucleus sampling parameter |
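Expressed as a request body, the defaults above translate to the dictionary below. This is a sketch: parameter names follow the OpenAI chat-completions convention the API advertises compatibility with.

```python
# Documented defaults for Qwen3 Next 80B Thinking, as request-body fields.
DEFAULT_PARAMS = {
    "stream": True,        # real-time token output
    "temperature": 0.6,    # keep low for reasoning tasks
    "max_tokens": 8192,    # raise for long chain-of-thought traces
    "top_p": 0.95,         # nucleus sampling
}

def with_overrides(**overrides):
    """Merge per-request overrides onto the documented defaults."""
    return {**DEFAULT_PARAMS, **overrides}

# Example: a deterministic, long-form proof request.
params = with_overrides(temperature=0.2, max_tokens=16384)
```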
Use Cases
- Complex multi-step reasoning
- Mathematical proofs
- Code synthesis
- Logical analysis
- Agentic planning
- Long-context document analysis
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| Hybrid Attention (Gated DeltaNet + Gated Attention) for efficient long-context processing | Thinking mode only — no fast non-thinking mode available |
| 10x throughput vs Qwen3-32B on 32K+ contexts | Longer thinking traces increase latency on complex tasks |
| Only 3.9B active parameters from 80B total (efficient inference) | New architecture with limited community tooling support |
| Native 256K context window | Function calling not supported |
| Outperforms Gemini-2.5-Flash-Thinking on multiple benchmarks | |
| Apache 2.0 — fully open-source with commercial use | |
Why Qubrid AI?
- 🚀 No infrastructure setup — 80B MoE served serverlessly, pay only for what you use
- 🔁 OpenAI-compatible — drop-in replacement using the same SDK, just swap the base URL
- 🧠 Reasoning at scale — Qwen3 Next’s 10x throughput advantage is fully realized on Qubrid’s low-latency infrastructure
- 🧪 Built-in Playground — prototype with system prompts and few-shot examples instantly at platform.qubrid.com
- 📊 Full observability — API logs and usage tracking built into the Qubrid dashboard
- 🌐 Multi-language support — Python, JavaScript, Go, cURL out of the box
Resources
| Resource | Link |
|---|---|
| 📖 Qubrid Docs | docs.platform.qubrid.com |
| 🎮 Playground | Try Qwen3 Next 80B Thinking live |
| 🔑 API Keys | Get your API Key |
| 🤗 Hugging Face | Qwen/Qwen3-Next-80B-A3B-Thinking |
| 💬 Discord | Join the Qubrid Community |
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.